2023-03-20
Estimating the precision of sample statistics using subsets of data (e.g. jackknifing) or random draws with replacement from a set of data points (e.g. bootstrapping);
Exchanging labels on data points when performing significance tests (e.g. permutation tests);
Validating models by using random subsets (e.g. repeated cross-validation; for example, tsCV() from previous lectures).
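As a rough illustration of the bootstrapping idea above, the sketch below estimates the standard error of a sample mean by resampling with replacement. The return values, the function name `bootstrap_se` and the number of replicates are all hypothetical choices made for this example.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, n_boot=2000, seed=42):
    """Bootstrap standard error: resample with replacement, recompute the
    statistic on each resample, and take the spread of the replicates."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))
    return statistics.stdev(replicates)


# Hypothetical monthly returns, for illustration only
returns = [0.012, -0.034, 0.021, 0.005, -0.010, 0.030, -0.002, 0.015, -0.022, 0.008]

se_boot = bootstrap_se(returns)
# Textbook standard error of the mean, for comparison
se_analytic = statistics.stdev(returns) / len(returns) ** 0.5
print(f"bootstrap SE: {se_boot:.4f}, analytic SE: {se_analytic:.4f}")
```

For the mean the two numbers should be close; the bootstrap becomes more useful for statistics with no simple standard error formula.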
Particularly used to compute expected values (e.g. options payoff)
Including those meant for inference and estimation (e.g. Bayesian estimation via stan_glm(), prophet(), simulated method of moments)
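A minimal sketch of a Monte Carlo expected-value calculation, using the options payoff example mentioned above. It assumes a European call on an asset following geometric Brownian motion; the parameter values are hypothetical, and the Black-Scholes formula is included only as a sanity check on the simulation.

```python
import math
import random


def mc_call_price(s0, strike, rate, vol, maturity, n_paths=50_000, seed=1):
    """Discounted average of simulated call payoffs max(S_T - K, 0)."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * maturity
    scale = vol * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + scale * rng.gauss(0.0, 1.0))
        total += max(s_t - strike, 0.0)
    return math.exp(-rate * maturity) * total / n_paths


def bs_call_price(s0, strike, rate, vol, maturity):
    """Closed-form Black-Scholes call price, used here only as a benchmark."""
    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    d1 = (math.log(s0 / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    return s0 * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical contract: at-the-money, one year, 5% rate, 20% volatility
mc_price = mc_call_price(100.0, 100.0, 0.05, 0.20, 1.0)
bs_price = bs_call_price(100.0, 100.0, 0.05, 0.20, 1.0)
print(f"Monte Carlo: {mc_price:.3f}, Black-Scholes: {bs_price:.3f}")
```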
A set of methods used to approximate multivariate density functions from a set of data points; it is largely applied to generate smooth functions, reduce the effects of outliers, improve joint density estimation and sampling, and derive non-linear fits.
A large class of nonlinear models widely used for inference and predictive modelling (e.g. time series forecasting, curve-fitting, prophet())
Regularisation methods are increasingly used as an alternative to traditional hypothesis testing and criteria-based methods, for allowing better quality forecasts with a large number of features.
This AI continuum of epistemological models spans three main communities:
Knowledge-based or heuristic algorithms (e.g. rule-based) - where knowledge is explicitly represented as ontologies or IF-THEN rules rather than implicitly via code (Giarratano and Riley, 1998)
Evolutionary or metaheuristics algorithms - a family of algorithms for global optimization inspired by biological evolution, using population-based trial and error problem solvers with a metaheuristic or stochastic optimization character (e.g. Genetic Algorithms, Genetic Programming, etc.) (Poli et al., 2008; Brownlee, 2011)
Machine Learning algorithms - a type of AI program that can learn without explicit programming and can change when exposed to new data; mainly comprising Supervised (e.g. Support Vector Machines, Random Forest, etc.), Unsupervised (e.g. K-Means, Independent Component Analysis, etc.), and Reinforcement Learning (e.g. Q-Learning, Temporal Differences, Policy Gradient methods, etc.) (Hastie et al., 2009; Sutton and Barto, 2018).
A complex system is any system featuring a large number of interacting components (e.g. agents, processes, etc.) whose aggregate activity is nonlinear (not derivable from the summation of the activity of individual components) and which typically exhibits hierarchical self-organization under selective pressures (Taylor, 2014; Barabási, 2016).
How does auto.arima() work?
- Select the number of differences \(d\) and \(D\) via the KPSS test and a seasonal strength measure.
- Select \(p,q\) by minimising the AICc.
- Use a stepwise search to traverse the model space.
\[\text{AICc} = -2 \log(L) + 2(p+q+k+1)\left[1 + \frac{(p+q+k+2)}{T-p-q-k-2}\right],\]
where \(L\) is the maximised likelihood fitted to the differenced data,
\(k=1\) if \(c\neq 0\) and \(k=0\) otherwise.
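The AICc formula can be checked with a few lines of arithmetic. The sketch below is a direct transcription of the expression above; the function name `aic_aicc` is hypothetical. The log-likelihood 2871.9 and the orders \(p=7, q=0, k=0\) are taken from the log(vix_ts) fit reported in this section, while the sample size `n_obs = 2500` is an assumed value used only to show the size of the small-sample correction.

```python
def aic_aicc(log_lik, p, q, k, n_obs):
    """Return (AIC, AICc) for an ARIMA model with p AR terms, q MA terms and
    k = 1 if a constant is included (0 otherwise)."""
    n_par = p + q + k + 1  # the +1 counts the innovation variance
    aic = -2.0 * log_lik + 2.0 * n_par
    aicc = -2.0 * log_lik + 2.0 * n_par * (1.0 + (n_par + 1) / (n_obs - n_par - 1))
    return aic, aicc


# log-likelihood and orders from the ARIMA(7,1,0) output; n_obs is assumed
aic, aicc = aic_aicc(log_lik=2871.9, p=7, q=0, k=0, n_obs=2500)
print(f"AIC = {aic:.2f}, AICc = {aicc:.2f}")
```

The AIC reproduces the reported value of -5727.8; the AICc is always slightly larger, and the gap shrinks as the sample size grows.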
Consider variations of current model:
Model with lowest AICc becomes current model.
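The stepwise idea can be sketched as a simple neighbourhood search. The list of variations is not reproduced in this section, so the sketch assumes the candidate models differ from the current one by one unit in \(p\) or \(q\); the score function below is a toy stand-in for the AICc of a fitted model, not a real model fit.

```python
def stepwise_search(score, start=(2, 2), max_order=5):
    """Greedy search over (p, q): move to any neighbour with a lower score
    until no neighbour improves on the current model."""
    current = start
    best = score(*current)
    improved = True
    while improved:
        improved = False
        p, q = current
        # Assumed neighbourhood: change p or q by one
        neighbours = [(p + 1, q), (p - 1, q), (p, q + 1), (p, q - 1)]
        for cand_p, cand_q in neighbours:
            if not (0 <= cand_p <= max_order and 0 <= cand_q <= max_order):
                continue
            cand_score = score(cand_p, cand_q)
            if cand_score < best:
                best, current, improved = cand_score, (cand_p, cand_q), True
    return current, best


def toy_score(p, q):
    """Hypothetical stand-in for an AICc surface with its minimum at (3, 1)."""
    return (p - 3) ** 2 + (q - 1) ** 2


best_order, best_score = stepwise_search(toy_score)
print(best_order, best_score)
```

A greedy search like this evaluates far fewer models than an exhaustive grid, at the cost of possibly stopping at a local minimum.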
Series: log(vix_ts)
ARIMA(7,1,0)
Coefficients:
          ar1      ar2      ar3      ar4      ar5      ar6      ar7
      -0.0874  -0.0437  -0.0174  -0.0596  -0.0180  -0.0499  -0.0390
s.e.   0.0197   0.0198   0.0198   0.0198   0.0198   0.0198   0.0197
sigma^2 = 0.006276: log likelihood = 2871.9
AIC=-5727.8 AICc=-5727.75 BIC=-5680.99
Setting the stepwise and approximation arguments to FALSE slows the automation down but provides a more exhaustive search for the appropriate model.
The auto.arima function then searches over all possible models using MLE.
See help(auto.arima) for more details.
Arima
Plot the data. Identify any unusual observations.
If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
If the data are non-stationary: take first differences of the data until the data are stationary.
Examine the ACF/PACF: Is an AR(p) or MA(q) model appropriate?
Try your chosen model(s), and use the AICc to search for a better model.
Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
Once the residuals look like white noise, calculate forecasts.
auto.arima
Plot the data. Identify any unusual observations.
If necessary, transform the data (using logs) to stabilize the variance.
Use auto.arima to select a model.
Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
Once the residuals look like white noise, calculate forecasts.
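The residual check in both procedures relies on the sample ACF and a portmanteau statistic. A minimal sketch of the Ljung-Box statistic \(Q = T(T+2)\sum_{k=1}^{h} r_k^2/(T-k)\) is given below; the two series are artificial examples, and in practice \(Q\) is compared with a chi-square critical value (for instance 18.31 at the 5% level with 10 degrees of freedom).

```python
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box portmanteau statistic over the first max_lag autocorrelations."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(200)]  # should give a small Q
alternating = [(-1.0) ** t for t in range(100)]          # strongly autocorrelated

print(f"Q (white noise, 10 lags): {ljung_box_q(white_noise, 10):.2f}")
print(f"Q (alternating, 1 lag):   {ljung_box_q(alternating, 1):.2f}")
```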
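library(tidyverse)   # select(), rename(), drop_na() and the pipe
library(tidyquant)   # tq_transmute() and periodReturn
library(forecast)    # autoplot() method for ts objects

# Monthly log returns of the Russell 2000 price index
tsfe::indices %>%
  select(date, `RUSSELL 2000 - PRICE INDEX`) %>%
  rename(r2000 = `RUSSELL 2000 - PRICE INDEX`) %>%
  drop_na() %>%
  tq_transmute(select = r2000, mutate_fun = periodReturn, type = "log") -> monthly_r2002r

# frequency = 12 marks the series as monthly (the default of 1 would treat each observation as a year)
ts(monthly_r2002r$monthly.returns, start = c(1988, 1), frequency = 12) -> r2000r_m_ts

autoplot(r2000r_m_ts) + xlab("Year") +
  ylab("monthly log returns")

Series: r2000r_m_ts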
ARIMA(3,0,1) with non-zero mean
Coefficients:
          ar1      ar2      ar3      ma1     mean
       1.0466  -0.0945  -0.0029  -1.0000   0.0062
s.e.   0.0509   0.0736   0.0511   0.0094   0.0004
sigma^2 = 0.002795: log likelihood = 586.84
AIC=-1161.68 AICc=-1161.46 BIC=-1137.96
\[y_t = 0.0003 + 1.047y_{t-1} - 0.094y_{t-2} - 0.003y_{t-3} + \varepsilon_t - 1.000\varepsilon_{t-1}\]
The standard errors of the four ARMA coefficients and the mean are 0.051, 0.074, 0.051, 0.009 and 0.0004, respectively.
This suggests that only the AR1 coefficient, the MA1 coefficient and the mean are more than 2 SEs away from zero and thus statistically significant.
The significance of the mean in this entertained model implies that the expected return of the series is positive.
R reports the mean \(\hat{\mu}\) directly for a model "with non-zero mean", so \(\hat{\mu}=0.0062\) per month; the constant in the equation above is \(\hat{\phi}_0=\hat{\mu}(1-(1.047-0.094-0.003)) = 0.0062 \times 0.05 \approx 0.0003\). The mean is small but has long-term implications.
Using the multi-period return definition from the financial data lecture, an annualised log return is simply \(\sum_1^{12} y_t\), which has expected value \(12\hat{\mu} \approx 0.074\) per annum.
\[\hat{y}_{T+h|T} \pm 1.96\sqrt{v_{T+h|T}}\]
where \(v_{T+h|T}\) is the estimated forecast variance.
\[\displaystyle y_t = \varepsilon_t + \sum_{i=1}^q \theta_i \varepsilon_{t-i}\]
\[\displaystyle v_{T+h|T} = \hat{\sigma}^2 \left[ 1 + \sum_{i=1}^{h-1} \theta_i^2\right], \qquad\text{for } h=2,3,\dots,\]
with \(\theta_i = 0\) for \(i > q\).
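A short sketch of how the MA(q) forecast variance and the 95% prediction interval fit together. The coefficient values, residual variance and point forecast are hypothetical; only the formulas come from the text, with \(\theta_i = 0\) for \(i > q\).

```python
import math


def ma_forecast_variance(sigma2, thetas, h):
    """h-step forecast variance of an MA(q) model with coefficients thetas.
    Slicing to h - 1 terms means coefficients beyond lag q contribute nothing."""
    return sigma2 * (1.0 + sum(theta ** 2 for theta in thetas[: h - 1]))


def prediction_interval(point_forecast, variance, z=1.96):
    """Symmetric interval: point forecast plus or minus z standard deviations."""
    half_width = z * math.sqrt(variance)
    return point_forecast - half_width, point_forecast + half_width


# Hypothetical MA(2) with residual variance 0.01 and a point forecast of 0.005
sigma2, thetas = 0.01, [0.5, 0.2]
for h in (1, 2, 3, 10):
    v = ma_forecast_variance(sigma2, thetas, h)
    low, high = prediction_interval(0.005, v)
    print(f"h={h:2d}  variance={v:.5f}  interval=({low:.3f}, {high:.3f})")
```

The variance stops growing once \(h > q\), which is why prediction intervals for a pure MA model flatten out after \(q\) steps.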
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
The procedure makes use of a decomposable time series model with three main model components: trend, seasonality, and holidays.
Similar to a generalised additive model (GAM), with time as a predictor, Prophet fits several linear and non-linear functions of time as components. In its simplest form:
\[y(t) = g(t) + s(t) + h(t) + e(t)\]
- where \(g(t)\) trend models non-periodic changes (i.e. growth over time)
- \(s(t)\) seasonality represents periodic changes (i.e. weekly, monthly, yearly)
- \(h(t)\) ties in effects of holidays (on potentially irregular schedules ≥ 1 day(s))
- \(e(t)\) covers idiosyncratic changes not accommodated by the model
In other words, the procedure’s equation can be written;
Modeling seasonality as an additive component is the same approach taken by exponential smoothing… GAM formulation has the advantage that it decomposes easily and accommodates new components as necessary, for instance when a new source of seasonality is identified.
\[ g(t)=\frac{C}{1+\exp(-k(t-m))}\]
where \(C\) is the carrying capacity, \(k\) is the growth rate, and \(m\) is an offset parameter.
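The basic logistic growth curve is easy to evaluate directly. In the sketch below the parameter values are hypothetical; it simply shows that the curve passes through half the capacity at \(t = m\) and saturates at \(C\).

```python
import math


def logistic_growth(t, capacity, rate, offset):
    """g(t) = C / (1 + exp(-k (t - m)))."""
    return capacity / (1.0 + math.exp(-rate * (t - offset)))


# Hypothetical values: capacity C = 100, growth rate k = 0.1, offset m = 50
for t in (0, 25, 50, 75, 100):
    print(t, round(logistic_growth(t, 100.0, 0.1, 50.0), 2))
```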
- There are two primary aspects of growth at Facebook (fluctuating carrying capacity and volatile rate of change) that are not captured in this simplified equation, though.
First, as with many scalable business models carrying capacity is not constant — as the number of people in the world who have access to the Internet increases, so does the growth ceiling.
Accounting for this is done by replacing the fixed capacity C with a time-varying capacity \(C(t)\).
Second, the market does not allow for stagnant technology.
Advances like those seen over the past decade in handheld devices, app development, and global connectivity, virtually ensure that growth rate is not constant.
Because this rate can quickly compound due to new products, the model must be able to incorporate a varying rate in order to fit historical data.
Roughly speaking, a structural break (changepoint) is where some type of random shock changes the statistical properties of the time series, for example a permanent change in the mean value of the time series.
Suppose there are \(S\) changepoints at times \(s_j, j = 1,\dots,S\).
Prophet defines a vector of rate adjustments \[\delta \in \mathbb{R}^{S}\]
where \(\delta_j\) is the change in rate that occurs at time \(s_j\).
The rate at any time \(t\) is then the base rate \(k\) plus all of the adjustments up to that time:
\[k + \sum_{j:t > s_j}\delta_j\]
This is represented more compactly with an indicator vector \[a(t) \in \{0,1\}^S\]
\[a_j(t)= \begin{cases}1, & \text{if } t\ge s_j, \\ 0, & \text{otherwise.} \end{cases}\]
The rate at time \(t\) is then \(k+a(t)^T\delta\). When the rate \(k\) is adjusted, the offset parameter \(m\) must also be adjusted to connect the endpoints of the segments. The correct adjustment at changepoint \(j\) is easily computed as (Taylor and Letham, 2017):
\[\gamma_j=\left(s_j-m-\sum_{l<j}\gamma_l \right) \left(1-\frac{k+\sum_{l<j}\delta_l}{k+\sum_{l \le j} \delta_l} \right)\]
The piecewise logistic growth model is then
\[g(t)=\frac{C(t)}{1+\exp(-(k+a(t)^T\delta)(t-(m+a(t)^T\gamma)))}\]
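To see how the offset adjustments \(\gamma_j\) keep the piecewise logistic trend continuous, the following sketch evaluates the formulas above for a small example. It assumes a constant capacity in place of \(C(t)\), and the changepoint times and rate adjustments are hypothetical.

```python
import math


def offset_adjustments(k, m, changepoints, deltas):
    """gamma_j from the formula above, computed sequentially over changepoints."""
    gammas = []
    for j, s_j in enumerate(changepoints):
        rate_before = k + sum(deltas[:j])      # k + sum of delta_l for l < j
        rate_after = k + sum(deltas[: j + 1])  # k + sum of delta_l for l <= j
        gammas.append((s_j - m - sum(gammas)) * (1.0 - rate_before / rate_after))
    return gammas


def piecewise_logistic(t, capacity, k, m, changepoints, deltas):
    """Logistic trend whose rate and offset shift at each changepoint."""
    gammas = offset_adjustments(k, m, changepoints, deltas)
    a = [1.0 if t >= s_j else 0.0 for s_j in changepoints]  # indicator vector a(t)
    rate = k + sum(a_j * d_j for a_j, d_j in zip(a, deltas))
    offset = m + sum(a_j * g_j for a_j, g_j in zip(a, gammas))
    return capacity / (1.0 + math.exp(-rate * (t - offset)))


# Hypothetical setup: two changepoints, one speeding growth up, one slowing it
params = dict(capacity=100.0, k=0.1, m=50.0, changepoints=[30.0, 60.0], deltas=[0.05, -0.08])
for t in (29.999, 30.0, 59.999, 60.0):
    print(t, round(piecewise_logistic(t, **params), 4))
```

The values just before and at each changepoint agree to within the step size, which is the continuity that the \(\gamma_j\) terms are designed to deliver.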
An important set of parameters in our model is C(t), or the expected capacities of the system at any point in time. Analysts often have insight into market sizes and can set these accordingly. There may also be external data sources that can provide carrying capacities, such as population forecasts from the World Bank. - Taylor and Letham (2017)
\[g(t)=(k+a(t)^T\delta)t+(m+a(t)^T\gamma)\]
where \(k\) is the growth rate, \(\delta\) has the rate adjustments, \(m\) is the offset parameter and, to make the function continuous, \(\gamma_j\) is set to \(-s_j\delta_j\).
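A minimal sketch of the piecewise linear trend, assuming the continuity adjustment \(\gamma_j = -s_j\delta_j\) given in Taylor and Letham (2017). The changepoints and rate adjustments are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
def piecewise_linear(t, k, m, changepoints, deltas):
    """g(t) = (k + a(t)^T delta) t + (m + a(t)^T gamma), with gamma_j = -s_j * delta_j."""
    gammas = [-s_j * d_j for s_j, d_j in zip(changepoints, deltas)]
    a = [1.0 if t >= s_j else 0.0 for s_j in changepoints]  # indicator vector a(t)
    rate = k + sum(a_j * d_j for a_j, d_j in zip(a, deltas))
    offset = m + sum(a_j * g_j for a_j, g_j in zip(a, gammas))
    return rate * t + offset


# Hypothetical trend: slope 1 until t = 10, then 1.5 until t = 20, then 0.5
cps, dls = [10.0, 20.0], [0.5, -1.0]
for t in (5.0, 10.0, 15.0, 20.0, 30.0):
    print(t, piecewise_linear(t, k=1.0, m=0.0, changepoints=cps, deltas=dls))
```

The slope changes at each changepoint but the level does not jump, so the segments join end to end.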
Automatic selection can be done quite naturally with the formulation in either model by putting a sparse prior on \(\delta\):
\[\delta_j \sim \text{Laplace}(0,\tau)\]
Critical note
A sparse prior on the adjustments \(\delta\) has no impact on the primary growth rate \(k\), so as \(\tau\) goes to 0 the fit reduces to standard (non-piecewise) logistic or linear growth.
When the model is extrapolated past the history to make a forecast, the trend g(t) will have a constant rate; the uncertainty in the forecast trend is estimated by extending the generative model forward.
The generative model for the trend is that there are \(S\) changepoints over a history of \(T\) points, each of which has a rate change \(\delta_j \sim \text{Laplace}(0,\tau)\).
Simulation of future rate changes (that emulate those of the past) is achieved by replacing τ with a variance inferred from data.
- In a fully Bayesian framework this could be done with a hierarchical prior on τ to obtain its posterior, otherwise we can use the maximum likelihood estimate of the rate scale parameter:
\[\lambda=\frac{1}{S}\sum_{j=1}^S |\delta_j|\]
- Future changepoints are randomly sampled in such a way that the average frequency of changepoints matches that in the history:
\[\forall j>T, \begin{cases} \delta_j=0 \text{ w.p. } \frac{T-S}{T}, \\ \delta_j \sim \text{Laplace} (0,\lambda) \text{ w.p. } \frac{S}{T} \end{cases}\]
- Thus, uncertainty in the forecast trend is measured by assuming the future will see the same average frequency and magnitude of rate changes that were seen in the history.
- Once λ has been inferred from the data, this generative model is deployed to “simulate possible future trends and use the simulated trends to compute uncertainty intervals.”
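The simulation described above can be sketched in a few lines. This is a simplified illustration, not Prophet's implementation: it assumes a linear trend with unit time steps, hypothetical historical rate changes, and draws Laplace variates as the difference of two exponential variates.

```python
import random


def rate_scale_mle(deltas):
    """lambda = mean absolute historical rate change."""
    return sum(abs(d) for d in deltas) / len(deltas)


def simulate_future_deltas(deltas, n_history, horizon, rng):
    """Each future step is a changepoint w.p. S/T, with a Laplace(0, lambda) rate change."""
    lam = rate_scale_mle(deltas)
    p_change = len(deltas) / n_history
    out = []
    for _ in range(horizon):
        if rng.random() < p_change:
            # Difference of two exponentials with mean lam is Laplace(0, lam)
            out.append(rng.expovariate(1.0 / lam) - rng.expovariate(1.0 / lam))
        else:
            out.append(0.0)
    return out


def simulate_trend_interval(level, rate, deltas, n_history, horizon, n_sims=1000, seed=7):
    """Simulate trend paths and return an approximate 95% interval for the final level."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_sims):
        sim_level, sim_rate = level, rate
        for d in simulate_future_deltas(deltas, n_history, horizon, rng):
            sim_rate += d
            sim_level += sim_rate
        finals.append(sim_level)
    finals.sort()
    return finals[int(0.025 * n_sims)], finals[int(0.975 * n_sims) - 1]


# Hypothetical history: 3 changepoints over 100 observations
hist_deltas = [0.1, -0.3, 0.2]
lower, upper = simulate_trend_interval(level=50.0, rate=0.5, deltas=hist_deltas,
                                       n_history=100, horizon=12)
print(f"lambda = {rate_scale_mle(hist_deltas):.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```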
Prophet’s assumption that the trend will continue to change with the same frequency and magnitude as it has in the history is fairly strong, so don’t bank on the uncertainty intervals having exact coverage.
As \(\tau\) is increased the model has more flexibility in fitting the history and so training error will drop.
Even so, when projected forward this flexibility is prone to produce wide intervals.
The uncertainty intervals are, however, a useful indication of the level of uncertainty, and especially an indicator of over fitting.
Business time series often have multi-period seasonality as a result of the human behaviors they represent. For instance, a 5-day work week can produce effects on a time series that repeat each week, while vacation schedules and school breaks can produce effects that repeat each year. To fit and forecast these effects we must specify seasonality models that are periodic functions of [time] t. - Taylor and Letham (2017)
\[s(t)= \sum_{n=1}^N \left(a_n \cos\left(\frac{2 \pi nt}{P}\right) + b_n \sin\left(\frac{2 \pi nt}{P} \right) \right)\]
Fitting seasonality requires estimating the \(2N\) parameters \(\beta = \left[a_1,b_1,\dots,a_N,b_N \right]^T\). This is done by constructing a matrix of seasonality vectors for each value of \(t\) in our historical and future data, for example with yearly seasonality and \(N = 10\):
\[X(t)=\left[\cos \left(\frac{2 \pi (1)t}{365.25}\right),\dots,\sin\left(\frac{2 \pi (10)t}{365.25}\right) \right]\]
\[s(t)=X(t)\beta\]
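The construction of \(X(t)\) and \(s(t) = X(t)\beta\) can be sketched as follows. The coefficient vector `beta` is hypothetical; the ordering of the features matches \(\beta = [a_1, b_1, \dots, a_N, b_N]^T\).

```python
import math


def fourier_features(t, period, n_terms):
    """Row X(t): [cos(2*pi*1*t/P), sin(2*pi*1*t/P), ..., cos(2*pi*N*t/P), sin(2*pi*N*t/P)]."""
    feats = []
    for n in range(1, n_terms + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.extend([math.cos(angle), math.sin(angle)])
    return feats


def seasonality(t, period, beta):
    """s(t) = X(t) beta, with N inferred from the length of beta."""
    x = fourier_features(t, period, len(beta) // 2)
    return sum(x_i * b_i for x_i, b_i in zip(x, beta))


# Yearly seasonality with N = 10 gives 20 features per time point
print(len(fourier_features(5.0, 365.25, 10)))

# Hypothetical weekly seasonality with N = 3 (six coefficients)
beta = [0.5, -0.2, 0.1, 0.3, -0.05, 0.02]
print([round(seasonality(float(day), 7.0, beta), 3) for day in range(8)])
```

The value for day 7 repeats the value for day 0, as expected for a function with period \(P = 7\).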
In the generative model, Prophet takes \(\beta \sim Normal(0,\sigma^2)\) to impose a smoothing prior on the seasonality.
Truncating the series at N applies a low-pass filter to the seasonality, so, albeit with increased risk of overfitting, increasing N allows for fitting seasonal patterns that change more quickly.
For yearly and weekly seasonality we have found N = 10 and N = 3 respectively to work well for most problems.
The choice of these parameters could be automated using a model selection procedure such as AIC.
The impact of a particular holiday on the time series is often similar year after year, so it is important to incorporate it into the forecast.
The component \(h(t)\) represents predictable events of the year, including those on irregular schedules (e.g. Easter).
To utilize this feature, the user needs to provide a custom list of events.
Including this list of holidays in the model is made straightforward by assuming that the effects of holidays are independent.
It is often important to include effects for a window of days around a particular holiday, such as the weekend of Thanksgiving. To account for that we include additional parameters for the days surrounding the holiday, essentially treating each of the days in the window around the holiday as a holiday itself. - Taylor and Letham (2017)
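One way to picture the holiday component is as a set of indicator features, one for each holiday and each day offset in its window, each with its own effect. The sketch below follows that idea under the independence assumption; the function names, the effect sizes in `kappa` and the one-day window are hypothetical.

```python
from datetime import date, timedelta


def holiday_features(day, holidays, window=1):
    """One 0/1 indicator per (holiday, offset) pair, so each day in the window
    around a holiday is treated as a holiday in its own right."""
    feats = {}
    for name, dates in holidays.items():
        for offset in range(-window, window + 1):
            hit = any(day == d + timedelta(days=offset) for d in dates)
            feats[(name, offset)] = 1.0 if hit else 0.0
    return feats


def holiday_effect(day, holidays, kappa, window=1):
    """h(t): sum of the effects of all active indicators (effects assumed independent)."""
    feats = holiday_features(day, holidays, window)
    return sum(kappa.get(key, 0.0) * value for key, value in feats.items())


# User-supplied list of events: past occurrences of each holiday
holidays = {"thanksgiving": [date(2015, 11, 26), date(2016, 11, 24)]}
# Hypothetical effects for the day before, the day itself and the day after
kappa = {("thanksgiving", -1): 0.2, ("thanksgiving", 0): -1.5, ("thanksgiving", 1): 0.8}

for d in (date(2015, 11, 25), date(2015, 11, 26), date(2015, 11, 27), date(2015, 12, 15)):
    print(d, holiday_effect(d, holidays, kappa))
```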
Taylor SJ, Letham B. 2017. Forecasting at scale. PeerJ Preprints 5:e3190v2 https://doi.org/10.7287/peerj.preprints.3190v2
Barry Quinn CStat